Interpreting BLEU/NIST Scores: How Much Improvement do We Need to Have a Better System?

نویسندگان

  • Ying Zhang
  • Stephan Vogel
  • Alexander H. Waibel
چکیده

Automatic evaluation metrics for Machine Translation (MT) systems, such as BLEU and the related NIST metric, are becoming increasingly important in MT. Yet, their behaviors are not fully understood. In this paper, we analyze some flaws in the BLEU/NIST metrics. With a better understanding of these problems, we can better interpret the reported BLEU/NIST scores. In addition, this paper reports a novel method of calculating the confidence intervals for BLEU/NIST scores using bootstrapping. With this method, we can determine whether two MT systems are significantly different from each other.

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

Rule-Based Translation of Spanish Verb-Noun Combinations into Basque

This paper presents a method to improve the translation of Verb-Noun Combinations (VNCs) in a rule-based Machine Translation (MT) system for SpanishBasque. Linguistic information about a set of VNCs is gathered from the public database Konbitzul, and it is integrated into the MT system, leading to an improvement in BLEU, NIST and TER scores, as well as the results being significantly better acc...

متن کامل

Experiments on Language Normalization for Spanish to English Machine Translation

We describe a trained system we have built, using freely available statistical packages and present experiments specifically designed to improve the performance of the system in the Spanish into English task. We performed both qualitative and quantitative evaluations that show better perceived translation quality as well as better BLEU and NIST scores.

متن کامل

A Look inside the ITC-irst SMT System

This paper presents a look inside the ITC-irst largevocabulary SMT system developed for the NIST 2005 Chinese-to-English evaluation campaign. Experiments on official NIST test sets provide a thorough overview of the performance of the system, supplying information on how single components contribute to the global performance. The presented system exhibits performance comparable to that of the b...

متن کامل

Machine Translation on the Medical Domain: The Role of BLEU/NIST and METEOR in a Controlled Vocabulary Setting

The main objective of our project is to extract clinical information from thoracic radiology reports in Portuguese using Machine Translation (MT) and cross language information retrieval techniques. To accomplish this task we need to evaluate the involved machine translation system. Since human MT evaluation is costly and time consuming we opted to use automated methods. We propose an evaluatio...

متن کامل

Voting on N-grams for Machine Translation System Combination

System combination exploits differences between machine translation systems to form a combined translation from several system outputs. Core to this process are features that reward n-gram matches between a candidate combination and each system output. Systems differ in performance at the n-gram level despite similar overall scores. We therefore advocate a new feature formulation: for each syst...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

عنوان ژورنال:

دوره   شماره 

صفحات  -

تاریخ انتشار 2004